Abstract

We introduce the technique of decomposing an undirected graph by finding a maximal
set of edge-disjoint cycles. We give a parallel algorithm to find this decomposition
in $O(\log n)$ time on $(m + n)/\log n$ processors. We then use this decomposition
to give the first efficient parallel algorithm for finding an approximation to a
minimum cycle cover. Our algorithm finds a cycle cover whose size is within a factor
of $O(1 + \frac{n \log n}{m + n})$ of the minimum-sized cover using $O(\log^2 n)$ time on
$(m + n)/\log n$ processors. We also generalize these algorithms to weighted graphs with running
times that are a factor of $O(\log C)$ slower than their unweighted counterparts, where
$C$ is the largest weight in the graph. Finally, we show how to use scaling to develop
parallel algorithms for the assignment problem in which the number of processors
used is independent of the magnitude of the edge costs. This leads to algorithms
for the assignment problem that do less work than any known RNC algorithms for
this problem.

Portions of this thesis are joint work with Philip Klein. 
Chapter 1

Introduction 



Many graph-theoretic problems that are "easy" to solve sequentially turn out to be
much more difficult to solve in parallel. For example, most linear-time sequential 
graph algorithms rely on decomposing a graph with a breadth-first search or a 
depth-first search. However, no fast and efficient parallel algorithm is known for 
either of these problems, i.e. no polylogarithmic-time algorithm is known that has 
a processor-time product even close to linear in the number of nodes and edges in 
the input graph. Thus, in order to design fast and efficient parallel algorithms for 
graph theoretic problems, we must consider new, or at least different, types of graph 
decompositions and algorithmic techniques. Examples of such decompositions that 
have proven fruitful include ear decompositions [35, 36], and Euler tours [44, 8, 34, 
18]; examples of useful general techniques include divide and conquer [32, 28] and 
dynamic programming [1]. The novel use of these decomposition and algorithmic 
techniques has led to efficient parallel algorithms for a variety of problems, and in 
some cases to improved sequential algorithms as well. In this thesis we introduce a 
new technique of decomposing a graph into a maximal set of edge-disjoint cycles and 
show how an old algorithmic technique, scaling, can be used to achieve improved 
parallel algorithms for the assignment problem. 

In Chapter 2, we introduce the technique of decomposing a graph by finding 
a maximal set of edge-disjoint cycles. Given a graph, we would like to identify a 
collection of edge-disjoint cycles whose removal renders the graph acyclic. We give 
a parallel algorithm that solves this problem for $n$-node, $m$-edge undirected graphs
in $O(\log n)$ time using $(m + n)/\log n$ processors of a concurrent-read concurrent-write
parallel random access machine (CRCW PRAM) [31]. Since the problem
requires $\Omega(n + m)$ operations in the worst case, our algorithm is optimal in its use
of parallelism.

This technique is related to the Euler tour technique. In 1736, Euler posed 
the first graph-theoretic problem, known as the Königsberg Bridge Problem. This
problem can be phrased in graph-theoretic terms as: given an undirected graph
G, find a cycle that covers every edge in the graph exactly once. A cycle of this 
form is called an Eulerian cycle and a graph that contains such a cycle is called 
an Eulerian graph. Euler showed that a graph is Eulerian if and only if the graph 
is connected and every node in the graph has even degree. Moreover, there is a 
linear-time sequential algorithm which, given an Eulerian graph, finds an Eulerian 
cycle. Awerbuch, Israeli, and Shiloach [8] have given a parallel algorithm to solve
this problem in O(log n) time using n + m processors, for graphs with n nodes and 
m edges. 

As this technique has proven useful for Eulerian graphs, there has been work on 
approximating Eulerian tours in non-Eulerian graphs. The standard technique is to 
convert a non-Eulerian graph into an Eulerian graph by adding a new node v', and 
an edge between every odd degree node and v'. However, this technique does not 
always prove to be useful algorithmically. Finding a maximal set of edge-disjoint 
cycles can be viewed as an alternative method of approximating an Eulerian tour, 
as it defines a maximal Eulerian subgraph. 

We also introduce a generalization of our algorithm that handles integer-weighted
undirected graphs as well. We think of the edge weights as edge multiplicities and
find a set of cycles with multiplicities, whose removal renders the graph acyclic. The
algorithm takes $O(\log n \log C)$ time on $(m + n)/\log n$ processors of a CRCW PRAM,
where C is the largest edge weight in the graph. We hope that this generalization 
will be of use in solving network optimization problems, as the weights allow us to 
model the notions of flow and capacity. 

In Chapter 3, we demonstrate the utility of this technique by using it to find a 
cycle cover of a biconnected graph. A cycle cover is a set of cycles such that every 
edge in the graph appears in at least one cycle. In applications such as the analysis 
of irrigation systems by the Hardy Cross method [13] and the analysis of electrical 
circuits, it is important to find a small cycle cover. Finding a minimum cover, one
using the fewest possible edges, is conjectured to be NP-complete [25]. We give
the first efficient parallel approximation algorithm for this problem, as we can find
an $O(1 + \frac{n \log n}{m + n})$ approximation to the minimum cycle cover in $O(\log^2 n)$ time on
$(m + n)/\log n$ processors. We also generalize this algorithm to multigraphs using
$O(\log^2 n \log C)$ time on $(m + n)/\log n$ processors.

Additionally, our techniques yield a useful sequential algorithm. The sequential 
algorithm that finds the smallest cycle cover is that of Alon and Tarsi [6]; their 
algorithm guarantees a constant factor approximation. However, their algorithm 
requires $O(m + n^2)$ time. Thus by sacrificing an $O(1 + \frac{n \log n}{m + n})$ factor in the size of
the cover, we obtain a faster algorithm, one that runs in $O(m + n \log n)$ time. Note
that for non-sparse graphs ($m > n \log n$), our techniques yield a cover whose size is
within a constant factor of optimal. Further, for all classes of graphs, our algorithm 
is faster than any of the previous algorithms for finding a cycle cover [6, 25, 26]. 

In Chapter 4, we demonstrate how scaling can be used to substantially reduce 
the amount of work needed to find a minimum perfect matching in a bipartite graph. 
Karp, Upfal, and Wigderson [30], and Mulmuley, Vazirani, and Vazirani [38], have 
recently developed randomized NC (RNC)¹ parallel algorithms for the minimum
perfect matching problem, assuming that the input is given in unary. For both of 
these algorithms, the number of processors needed is proportional to the magnitude 
of the largest edge cost. We show how to convert these algorithms into algorithms 
that use a number of processors that is independent of the magnitude of the largest 
edge cost, provided that the graph is bipartite. As a tradeoff, we get an increase in 
the time spent that is proportional to the logarithm of the magnitude of the largest 
number. If $C = \Omega(n^{1+\epsilon})$, we get algorithms that do less work, where work is the
product of the number of processors used and the time spent. Assuming similarity²,
our algorithms are in RNC.

We achieve these results by using scaling, which reduces the problem of finding 



¹ NC is the class of algorithms that, on input of size $n$, use $n^{k_1}$ processors and $O(\log^{k_2} n)$ time,
for some constants $k_1$ and $k_2$. RNC algorithms are NC algorithms that allow each processor to
generate an $O(\log n)$ bit random number at each step in the computation.

² The similarity assumption is the assumption that $\log C = O(\log n)$, where $C$ is the largest edge
cost. See [17] for details.
a matching in a graph with large edge costs to the problem of finding a sequence
of matchings in a sequence of graphs, each of which has small edge costs. Scaling 
was first introduced by Edmonds and Karp [15] and has recently been a part of 
efficient sequential algorithms for shortest paths [3, 17], maximum flow [17, 4, 5], 
minimum-cost flow [15, 23, 2, 39, 19], and matching [19, 4]. In parallel computation 
scaling has received somewhat less attention [19, 21]. Our algorithm combines ideas 
involving scaling and dual variables from the sequential algorithms of Gabow [17] 
and Gabow and Tarjan [19], with the parallel matching algorithms of Mulmuley, 
Vazirani, and Vazirani [38] and Karp, Upfal, and Wigderson [30]. 
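
As a rough illustration of the scaling idea only (not the matching algorithm of Chapter 4), the sketch below lists the cost vectors that a generic bit-scaling scheme would visit, coarsest first. The function name `scaled_costs` and the choice to reveal one bit per phase are assumptions made for this example. Each phase's costs equal twice the previous phase's costs plus one new bit, so the costs change by a small amount between consecutive phases.

```python
def scaled_costs(costs):
    """Yield successive approximations of integer costs, one more bit per phase."""
    bits = max(costs).bit_length()          # number of phases needed
    for k in range(1, bits + 1):
        # Keep only the k most significant of the `bits` bits of every cost.
        yield [c >> (bits - k) for c in costs]


phases = list(scaled_costs([5, 2, 7]))
for phase in phases:
    print(phase)
```

The number of phases is the bit length of the largest cost, which matches the logarithmic dependence on the largest cost described above.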



Chapter 2 

Finding a Maximal Set of Edge 
Disjoint Cycles 



In this chapter, we introduce a new graph decomposition technique, that of decomposing
a graph into a maximal set of edge-disjoint cycles. Alternatively, this can
be viewed as a set of cycles whose removal from a graph renders the graph acyclic.
In Section 2.1, we discuss the concepts of a cycle space and a cycle basis, and show
their relationship to a spanning forest of a graph. In Section 2.2, we use these concepts
to give a parallel algorithm that finds a maximal set of edge-disjoint cycles
in $O(\log n)$ time on $(m + n)/\log n$ processors. Finally, in Section 2.3, we use the
algorithm of Section 2.2 as a subroutine in an algorithm that finds a maximal set of
edge-disjoint cycles in $O(\log n \log C)$ time on a multigraph, where $C$ is the largest
edge multiplicity.

This chapter is joint work with Philip Klein. 

2.1 Preliminaries 

Let $G = (V, E)$ be an $n$-node, $m$-edge undirected graph with node set $V$ and edge set
$E = \{e_1, \ldots, e_m\}$. A maximal set of edge-disjoint cycles of $G = (V, E)$ is a set of
cycles $C_1, \ldots, C_k$ such that $C_i \subseteq E$, $C_i \cap C_j = \emptyset$ for $i \neq j$, and the graph
$G' = (V, E - \bigcup_i C_i)$ is acyclic. Let $\{0, 1\}^E$ denote the $m$-dimensional vector space over GF(2).
Each subgraph $H$ of $G$ corresponds to a vector $\mu(H) = (\mu_1(H), \ldots, \mu_m(H))$, where $\mu_i(H) = 1$
if edge $e_i$ appears in $H$, and $\mu_i(H) = 0$ otherwise. Further, for any edge $e$, let $\mu(e)$
be the vector corresponding to the subgraph containing only edge $e$. An even-degree
subgraph of G is a subgraph in which every node has even degree. Every connected 
component of an even-degree subgraph is Eulerian; hence, the edges of such a sub- 
graph can be decomposed into cycles. Furthermore, the following fact is well-known: 

Fact 2.1.1 Let $C(G) = \{\mu^1, \ldots, \mu^l\}$ be the vectors corresponding to $\{H_1, \ldots, H_l\}$,
the set of all even-degree subgraphs of $G$. Then $C(G)$ is a vector subspace of $\{0, 1\}^E$.

We will call this vector subspace the cycle space of G and denote it by C(G). Further, 
we will use the following simple corollary of Fact 2.1.1: 

Fact 2.1.2 For two even-degree subgraphs $H_1$ and $H_2$ of $G$, the elementwise mod
2 sum of the vectors $\mu(H_1)$ and $\mu(H_2)$ is again a vector $\mu(H)$ corresponding to an
even-degree subgraph $H$.
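
Fact 2.1.2 can be checked on a small instance. In the sketch below, which is only an illustration, a subgraph is stored as a set of edges rather than a 0/1 vector, so the elementwise mod 2 sum becomes a symmetric difference. The helper names `edge_set`, `is_even_degree`, and `gf2_sum` are hypothetical.

```python
from collections import Counter


def edge_set(*pairs):
    """Build a subgraph as a set of undirected edges (frozensets of endpoints)."""
    return {frozenset(p) for p in pairs}


def is_even_degree(edges):
    """True if every node touched by `edges` has even degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return all(d % 2 == 0 for d in deg.values())


def gf2_sum(h1, h2):
    """Elementwise mod 2 sum of two edge sets, i.e. their symmetric difference."""
    return h1 ^ h2


# Two triangles sharing the edge (2, 3); their sum is the 4-cycle 1-2-4-3.
h1 = edge_set((1, 2), (2, 3), (1, 3))
h2 = edge_set((2, 3), (3, 4), (2, 4))
h = gf2_sum(h1, h2)
print(sorted(sorted(e) for e in h), is_even_degree(h))
```

The shared edge appears in both summands and cancels, which is the same cancellation used for the paths to the root in equation 2.1.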

Note that this cycle space can contain as many as $2^m$ vectors, and is, in this form,
impractical algorithmically. 

Thus, we will focus on finding a cycle basis of the cycle space C(G), i.e. a set 
of linearly independent vectors that span the cycle space. Clearly, the number of 
vectors in any basis is equal to the dimension of the space, and hence is independent 
of the choice of the basis. In fact, the size of a cycle basis can be well characterized: 

Lemma 2.1.3 ([9]) Let $G$ be a graph with $p$ connected components. The cardinality
of any cycle basis of $G$ is exactly $m - n + p$.

We proceed to characterize an easily found basis. 

Let $F$ be a spanning forest of $G$, rooted by choosing a root within each tree of $F$;
we write $F$ for both the forest and its rooted version. For each node $v$, there
is a unique simple path $P(v)$ from $v$ to the root of the tree containing $v$. For two
nodes $v$ and $w$ belonging to the same tree, the lowest common ancestor of $v$ and $w$,
denoted lca$(v, w)$, is the first node common to $P(v)$ and $P(w)$.

An edge $e = (v, w)$ of $G$ not appearing in $F$ is called a cycle-edge (with respect
to $F$) because $F \cup \{e\}$ contains a unique simple cycle, denoted $C(e)$. In particular,
the cycle $C(e)$ consists of $e$, together with the paths from $v$ and $w$ to their lowest
common ancestor in $F$. Using the GF(2) vector notation, we can write the cycle
$C(e)$ as

$$\mu(C(e)) = \mu(e) + \mu(P(v)) + \mu(P(w)), \qquad (2.1)$$

because the portions of $P(v)$ and $P(w)$ from lca$(v, w)$ to the root coincide and cancel
each other. We can now characterize a particular basis $B$:

Lemma 2.1.4 Let graph $G$ have spanning forest $F$. Then $B = \{\mu(C(e)) \mid e \notin F\}$ is
a cycle basis of $G$.

Proof: Let $e_1, \ldots, e_k$ be the cycle-edges of $G$ with respect to $F$. Then each vector
$\mu(C(e_i))$ in $B$ contains exactly one cycle-edge $e_i$; further, each cycle-edge appears
in exactly one vector in $B$. Thus, this collection of vectors is linearly independent,
and its cardinality is $m - (n - p)$, since any spanning forest contains $n - p$ edges.
Thus, by Lemma 2.1.3, $B$ is a cycle basis. ■
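
The basis of Lemma 2.1.4 can be built directly from equation 2.1. The following is a minimal sequential sketch, assuming nodes are numbered 0 to n-1 and edges are referred to by their index in a list; the function names are hypothetical, and the plain graph search stands in for whatever spanning forest routine is available.

```python
def spanning_forest(n, edges):
    """Graph search returning (parent, forest); parent[v] = (parent node, edge index)."""
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    parent = [None] * n            # roots keep parent None
    seen = [False] * n
    forest = set()
    for r in range(n):
        if seen[r]:
            continue
        seen[r] = True
        stack = [r]
        while stack:
            u = stack.pop()
            for v, i in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    parent[v] = (u, i)
                    forest.add(i)
                    stack.append(v)
    return parent, forest


def path_to_root(v, parent):
    """Edge indices on P(v), the forest path from v to the root of its tree."""
    path = set()
    while parent[v] is not None:
        v, i = parent[v]
        path.add(i)
    return path


def fundamental_cycle(e, edges, parent):
    """Equation 2.1 over GF(2): the shared part of P(v) and P(w) cancels."""
    v, w = edges[e]
    return {e} ^ path_to_root(v, parent) ^ path_to_root(w, parent)


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
parent, forest = spanning_forest(4, edges)
basis = [fundamental_cycle(e, edges, parent)
         for e in range(len(edges)) if e not in forest]
print(basis)
```

On this connected example, n = 4 and m = 5, so the basis has m - n + 1 = 2 vectors, each containing exactly one non-forest edge.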

2.2 An Algorithm for Graphs 

We will use the cycle basis $B$ as a building block for a simple algorithm for finding
a maximal set of edge-disjoint cycles. First, we find a rooted spanning forest, $F$, of
$G$. Second, we let $B = \{\mu^1, \ldots, \mu^l\}$ be the cycle basis defined in Lemma 2.1.4 and
compute the subgraph $H$ defined by

$$\mu(H) = \sum_{\mu^i \in B} \mu^i. \qquad (2.2)$$

Finally, we decompose H into a set of edge-disjoint cycles. The algorithm appears 
in Figure 2.1. 

To show that this is indeed a maximal set of edge-disjoint cycles, we will prove
the following lemma:

Lemma 2.2.1 Let $H$ be defined by $\mu(H) = \sum_{\mu^i \in B} \mu^i$. Then $H$ is a maximal set of
edge-disjoint cycles.
Input: Undirected graph G.

Output: H, a maximal set of edge-disjoint cycles of G. 

1 Choose a rooted spanning forest F of G. 

2 Determine the subgraph H of G by (2.2). 

3 Decompose H into edge-disjoint cycles. 

Figure 2.1: Algorithm Maximal Cycles 

Proof: By repeated application of Fact 2.1.2, $H$ is an even-degree subgraph, and
hence, can be decomposed into edge-disjoint cycles. Now we show that it is maximal.
Let $e_1, \ldots, e_k$ be the cycle-edges of $G$ with respect to $F$. Then, by equation 2.1 and
the definition of $B$,

$$\mu(H) = \sum_{\mu^i \in B} \mu^i = \sum_{i=1}^{k} \mu(C(e_i)), \qquad (2.3)$$

where the sum is elementwise mod 2. Since each edge $e$ not in $F$ occurs exactly
once in the sum, $e$ is in the subgraph $H$. Thus $H$ contains all non-forest edges and
possibly some forest edges. All edges not in $H$ must be in $F$, so $G - H \subseteq F$, and
thus $G - H$ is acyclic. ■
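
The decomposition of an even-degree subgraph into edge-disjoint cycles (Step 3 of Figure 2.1) can be illustrated sequentially. The sketch below is not the parallel list-ranking method; it simply walks along unused edges and cuts off a cycle whenever the walk returns to a node already on it. The name `decompose_into_cycles` is hypothetical, and the input is assumed to have even degree at every node.

```python
def decompose_into_cycles(edges):
    """Split an even-degree edge list into edge-disjoint simple cycles.

    Returns a list of cycles, each a list of edge indices.
    """
    adj = {}
    for i, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, i))
        adj.setdefault(v, []).append((u, i))
    used = [False] * len(edges)
    cycles = []
    for start in adj:
        nodes, path, pos = [start], [], {start: 0}   # current simple walk
        while True:
            u = nodes[-1]
            while adj[u] and used[adj[u][-1][1]]:     # discard edges already taken
                adj[u].pop()
            if not adj[u]:
                break      # with even degrees this only happens at `start`
            v, i = adj[u].pop()
            used[i] = True
            path.append(i)
            if v in pos:                              # walk closed a simple cycle
                k = pos[v]
                cycles.append(path[k:])
                del path[k:]
                for x in nodes[k + 1:]:
                    del pos[x]
                del nodes[k + 1:]
            else:
                pos[v] = len(nodes)
                nodes.append(v)
    return cycles


# Two triangles sharing node 0; every node has even degree.
edges = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]
cycles = decompose_into_cycles(edges)
print(cycles)
```

Every edge is used exactly once, so the cycles returned are edge-disjoint and together contain all of the input edges.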

Now, we focus on the time it takes to implement algorithm Maximal Cycles.
Step 1 can be implemented in $O(\log n)$ time using $(m + n)/\log n$ processors by the
spanning tree algorithm of Jung [29]. Step 3 can be implemented in $O(\log n)$ time
using $(n + m)/\log n$ processors by the list-ranking algorithm of Cole and Vishkin
[11] or that of Anderson and Miller [7]. A naive implementation of step 2 consists
of summing at most $m - n + 1$ vectors of length $m$. This approach takes $O(\log n)$
time but requires about $m^2$ processors. We describe an alternate approach that
takes advantage of the structure of the problem to achieve $O(\log n)$ time using only
$(n + m)/\log n$ processors.

Recall that our goal is to achieve a characterization of which edges are in
$\mu(H) = \sum_{\mu^i \in B} \mu^i$. First observe that every edge of $G$ that is not in the forest
$F$ is automatically in the subgraph $H$. Thus, we need only determine which edges
of $F$ are contained in $H$. We use equation 2.1 to rewrite equation 2.3 as

$$\mu(H) = \sum_{e=(v,w) \notin F} \left[ \mu(e) + \mu(P(v)) + \mu(P(w)) \right]. \qquad (2.4)$$

Now focus on $\mu_i(H)$, the $i$th component of $\mu(H)$. Because addition over GF(2)
is equivalent to logical exclusive-or, we can express equation 2.4 as

$$\mu_i(H) = \bigoplus_{e=(v,w) \notin F} \left[ \mu_i(e) \oplus \mu_i(P(v)) \oplus \mu_i(P(w)) \right]. \qquad (2.5)$$

Figure 2.2: To obtain $F_{split}$, split into two every edge not in the forest $F$. (The
shaded edges are a spanning tree rooted at $r$.)



Let $(x, y)$ be an edge of $F$ corresponding to $\mu_i(H)$ with $y$ the parent of $x$. For
any $e = (v, w) \notin F$, $\mu_i(e) = 0$, and $\mu_i(P(v)) = 1$ exactly when $v$ is a descendant of $x$
(counting $x$ as a descendant of itself).
Let

$$\#d(x) = \sum_{v \text{ a descendant of } x} |\{(v, w) : (v, w) \notin F\}|. \qquad (2.6)$$

We can use equation 2.6 to rewrite equation 2.5 as

$$(x, y) \in H \iff \#d(x) \equiv 1 \pmod 2. \qquad (2.7)$$



In order to compute H, therefore, it suffices to compute #d(x) for each node x of 
G. 

We compute $\#d(x)$ by the following procedure. Let $F_{split}$ be the rooted forest
obtained from $F$ and $G$ by splitting in two each edge not in $F$, as illustrated in
Figure 2.2. More formally, $F_{split}$ is obtained from $F$ by adding two new nodes $a_i$
and $b_i$ and two new edges $(a_i, x)$ and $(b_i, y)$ for each edge $e_i = (x, y) \in G - F$. We
refer to the original nodes of $G$ as old nodes and the new nodes obtained by splitting
edges as new nodes. We then use parallel tree evaluation [37, 44, 11], to compute,
Input: Undirected graph $G$ with spanning forest $F$.

Output: $H$, an even-degree subgraph of $G$, such that $G - H$ is acyclic.

1 Let $F_{split}$ be obtained from $G$ by splitting in two each edge not in $F$.

2 For each node $x \in G$, use parallel tree evaluation to compute $\#n(x)$, the
number of descendant new nodes of $x$ in $F_{split}$.

3 Let $H = \{(v, w) : (v, w) \notin F\} \cup \{(x, y) : \#n(x) \equiv 1 \bmod 2\}$

Figure 2.3: Computing the subgraph H

for each node x, #n(x), the number of descendants of x that are new nodes. This 
implementation of Step 2 appears in Figure 2.3. 
The following lemma justifies this procedure: 

Lemma 2.2.2 Let $\#n(x)$ be the number of descendant new nodes of $x$ in $F_{split}$,
and let $\#d(x)$ be defined by equation 2.6. Then $\#n(x) = \#d(x)$, where $\#d(x)$ is
computed in $G$. Further, $\#n(x)$ can be computed in $O(\log n)$ time on $(m + n)/\log n$
processors.

Proof: In the rooted forest $F_{split}$, each of the new nodes $a_i$ and $b_i$ is a leaf.
It is easy to see that the number of new nodes connected to an old node $v$ is
$|\{(v, w) : (v, w) \notin F\}|$. Hence for each node $x$ of $G$, the number of descendants of
$x$ in $F_{split}$ that are new nodes is exactly $\#d(x)$. Thus to compute $\#d(x)$, it suffices
to compute $\#n(x)$ in $F_{split}$. Further, observe that the number of nodes in $F_{split}$,
including new nodes, is at most $n + 2m$. Thus, using parallel tree evaluation, we
can compute $\#n(x)$ for all old nodes $x$ of $G$ simultaneously, in $O(\log n)$ time using
$(m + n)/\log n$ processors. ■

Combining all the above results, we get the following theorem: 

Theorem 2.2.3 Algorithm Maximal Cycles, with Step 2 implemented as in Figure 
2.3 finds a maximal set of edge-disjoint cycles in an undirected graph in O(logn) 
time on (n + m)/logn processors. 

Proof: Immediate from Lemmas 2.2.1 and 2.2.2 and the fact that computing #n(x) 
is the dominant step in computing subgraph H. ■ 
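
The parity rule of equation 2.7 can be tried out sequentially. The sketch below is an illustration under stated assumptions, not the parallel tree evaluation: it assumes nodes numbered 0 to n-1, builds a rooted forest with a plain graph search, and replaces the explicit new leaves of $F_{split}$ by a per-node counter. The function name `maximal_cycle_subgraph` is hypothetical.

```python
def maximal_cycle_subgraph(n, edges):
    """Return (H, forest) as sets of edge indices, following Figure 2.3."""
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    parent = [None] * n        # parent[v] = (parent node, forest edge index)
    seen = [False] * n
    order = []                 # nodes in visiting order; parents precede children
    forest = set()
    for r in range(n):
        if seen[r]:
            continue
        seen[r] = True
        stack = [r]
        while stack:
            u = stack.pop()
            order.append(u)
            for v, i in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    parent[v] = (u, i)
                    forest.add(i)
                    stack.append(v)
    # count[x] starts as the number of new leaves hung on x in F_split,
    # i.e. the number of non-forest edge endpoints at x.
    count = [0] * n
    H = set()
    for i, (u, v) in enumerate(edges):
        if i not in forest:
            H.add(i)
            count[u] += 1
            count[v] += 1
    # Children first: after x is processed, count[x] equals #n(x).
    for x in reversed(order):
        if parent[x] is not None:
            p, i = parent[x]
            if count[x] % 2 == 1:      # equation 2.7
                H.add(i)
            count[p] += count[x]
    return H, forest


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
H, forest = maximal_cycle_subgraph(4, edges)
print(sorted(H), sorted(forest))
```

On this example the search puts edges 0, 2, and 4 in the forest, and the result is the 4-cycle 0-1-2-3, which is the mod 2 sum of the two fundamental triangles; the only edge left outside H is a forest edge.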
2.3 An Algorithm for Multigraphs

In order to model the notions of flow and capacity that arise in network optimiza- 
tion problems, we consider multigraphs, graphs in which there may be many edges 
connecting a given pair of nodes. A multigraph can be succinctly represented as
$G = (V, E, m)$, where $G' = (V, E)$ is an ordinary (simple) graph, and $m : E \to Z^+$
assigns a non-negative multiplicity to each edge in $E$. Thus for each edge $e = (x, y)$
of the simple graph $G'$, there are $m(e)$ edges with the same endpoints $x$ and $y$ in
the multigraph $G$. Since the multiplicity of an edge changes over the course of the
algorithm, we will use $m_0(e)$ to denote the multiplicity of edge $e$ in the input graph,
and $m(e)$ to denote the current multiplicity of edge $e$. Let $M$ be the maximum
multiplicity of any edge in the multigraph $G$.

We shall give a parallel algorithm to find a maximal collection of edge-disjoint 
cycles in a multigraph G, where no cycle is permitted to contain more than one edge 
with the same endpoints. Observe that if M is very large, we might have to remove 
an enormous number of cycles from G in order to render G acyclic. Therefore, the 
output of the algorithm shall consist of a collection of pairs of the form (C, m), where 
C is a cycle and m is taken to be the multiplicity of the cycle. Such a collection 
is a solution to the maximal edge-disjoint cycles problem for G if the following two 
conditions are satisfied: 

M.1: For every edge e of G', the sum of the multiplicities of cycles C in which e
occurs is at most the original multiplicity of e in G. 

M.2: The set of edges e for which the above inequality is strict form an acyclic 
subgraph of G'. 

Our parallel algorithm to find such a solution takes $O(\log n \log M)$ time using
$(m + n)/\log n$ processors.

Define $m(C)$, the multiplicity of cycle $C$, to be $\min_{e \in C} \{m(e)\}$, and define the
maximum cycle multiplicity of a multigraph to be the maximum multiplicity of any
cycle in $G$, i.e. $\max_{C \in \mathcal{C}} m(C)$, where $\mathcal{C}$ is the set of all cycles in $G'$. Our general
approach will be to remove cycles of high multiplicity. Our algorithm ensures that 
the maximum cycle multiplicity is a non-increasing function of time. More specifi- 
Input: Multigraph $G = (V, E, m_0)$.

Output: $S$, a maximal set of edge-disjoint cycles.

1 Let $\Delta \leftarrow 2^{\lceil \log(M+1) \rceil}$. Let $S \leftarrow \emptyset$. Let $m \leftarrow m_0$.

2 While $\Delta > 1$

3     Let $G_\Delta$ be the graph induced on the edges of $G'$ with $m(e) \geq \Delta/2$.

4     Let $F$ be a spanning forest of $G_\Delta$ in which appear all edges of multiplicity
      at least $\Delta$.

5     Using $F$, find a maximal set of edge-disjoint cycles in $G_\Delta$, ignoring multiplicities.

6     For each cycle $C$ found in step 5,

7         Assign multiplicity $\Delta/2$ to $C$, and add it to $S$.

8         For each edge $e \in C$, let $m(e) \leftarrow m(e) - \Delta/2$.

9     Let $\Delta \leftarrow \Delta/2$.

10 Output the set $S$ of cycles with multiplicities.

Figure 2.4: Algorithm Maximal Capacitated Cycles

cally, we will show that a routine similar to Maximal Cycles, applied to the proper 
graph, will decrease the maximum cycle multiplicity by a factor of 2. Assuming all 
initial multiplicities are integer-valued, only $\log M$ iterations are needed.

The algorithm maintains a variable $\Delta$ such that $\Delta$ is a strict upper bound on the
maximum cycle multiplicity. In one iteration of the algorithm we consider only the
edges of multiplicity at least $\Delta/2$. Furthermore, we force all edges of multiplicity
at least $\Delta$ to be in the spanning tree. Once we have ensured that we satisfy these
constraints, we find a maximal set of edge-disjoint cycles in the resulting graph,
assign each of these cycles a multiplicity of $\Delta/2$, and then remove these cycles
from $G$, adding them to $S$. The algorithm, Maximal Capacitated Cycles, appears
in Figure 2.4. When an iteration terminates, $S$ contains a collection of cycles with
multiplicities. To show this collection is maximal, we first prove the following lemma:

Lemma 2.3.1 At each iteration in algorithm Maximal Capacitated Cycles, step 4
succeeds, and the maximum cycle multiplicity is less than $\Delta$.

Proof: The proof is by induction on the number of iterations. The basis is trivial
because initially $\Delta$ exceeds every edge-multiplicity. Suppose that after $i$ iterations,
the lemma holds, and the loop has not terminated. We will prove the lemma holds
after the $(i + 1)$st iteration. Since there is no cycle of multiplicity $\Delta$, the edges of
multiplicity at least $\Delta$ form an acyclic subgraph of $G$. Hence a spanning forest $F$
containing every such edge can be constructed in Step 4. Consider the non-tree edges
with multiplicity at least $\Delta/2$. Since they are not in the forest, these edges each
have multiplicity less than $\Delta$. In steps 5 through 8, we reduce the multiplicity of
each of these edges by $\Delta/2$. Hence after step 8, every non-tree edge has multiplicity
less than $\Delta/2$. Thus, the edges of multiplicity greater than or equal to $\Delta/2$ form
an acyclic subgraph, and after $\Delta$ is halved in step 9, the lemma still holds. ■
Given this lemma, we can prove the following theorem:
Given this lemma, we can prove the following theorem: 

Theorem 2.3.2 Algorithm Maximal Capacitated Cycles finds a maximal set of cycles
in an $n$-node undirected multigraph $G$ in $O(\log n \log M)$ time on $(m + n)/\log n$
processors.

Proof: First we show that when the algorithm terminates, $S$ is a maximal set of
edge-disjoint cycles. It is easy to see that condition M.1 is maintained throughout
the algorithm. Thus, when the algorithm terminates, for each edge $e$, the current
value of $m(e)$ is $m_0(e) - \sum_{\{C : e \in C\}} m(C)$, where $m(C)$ is the multiplicity of cycle $C$
in $S$, and $m(e) \geq 0$. Furthermore, since $\Delta \leq 1$, by Lemma 2.3.1 there are no cycles
of multiplicity greater than or equal to one and the graph of edges with $m(e) > 0$ is
acyclic. Thus we have satisfied condition M.2.

Now we show that we can achieve the stated resource bounds. Since $\Delta$ is initially
no bigger than $2(M + 1)$, and never decreases below 1, there are only $O(\log M)$
iterations. We claim that each iteration can be carried out in $O(\log n)$ time on
$n + m$ processors. It is easy to see that each step, except for steps 4 and 5, takes
constant time on $n + m$ processors and hence takes $O(\log n)$ time on $(n + m)/\log n$
processors. By looking at the spanning tree algorithm of [29], it is easy to see that
given an acyclic set of edges that have to be in the spanning tree, the problem
only becomes easier, thus steps 4 and 5 can be done in the same time as algorithm
Maximal Cycles. ■

We conclude with three observations. First, assuming similarity, i.e. $M = O(n^k)$
for some constant $k$ [17], this algorithm runs in $O(\log^2 n)$ time on $(m + n)/\log n$
processors. Second, observe that when we removed a cycle we always assigned it
a multiplicity of $\Delta/2$. However, we could in general assign it a higher multiplicity,
namely that of the minimum-multiplicity edge in it. Since this could only serve to
reduce the maximum cycle multiplicity at an even faster rate, the bounds above 
still hold. Finally, while this algorithm has been presented as an algorithm on 
multigraphs, it could also be viewed as an algorithm that works on graphs with 
weights or capacities on the edges, where the weights or capacities correspond to 
the edge multiplicities. 
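
The loop of Figure 2.4 can be followed on a small input with a sequential sketch. This is an illustration only: a union-find forest stands in for the parallel spanning tree algorithm, the parity rule (2.7) is evaluated by a plain traversal, and all function names are hypothetical. Nodes are assumed to be numbered 0 to n-1, and `edges` lists the edges of the simple graph G'.

```python
def spanning_forest(n, edges, order):
    """Kruskal-style forest: try edge indices in `order`, keep those joining two trees."""
    comp = list(range(n))
    def find(x):
        while comp[x] != x:
            x = comp[x]
        return x
    forest = set()
    for i in order:
        a, b = find(edges[i][0]), find(edges[i][1])
        if a != b:
            comp[a] = b
            forest.add(i)
    return forest

def even_subgraph(n, edges, active, forest):
    """Active non-forest edges plus the forest edges chosen by parity rule (2.7)."""
    tree = [[] for _ in range(n)]
    for i in forest:
        u, v = edges[i]
        tree[u].append((v, i))
        tree[v].append((u, i))
    parent, seen, order = [None] * n, [False] * n, []
    for r in range(n):
        if seen[r]:
            continue
        seen[r], stack = True, [r]
        while stack:
            u = stack.pop()
            order.append(u)                  # parents are recorded before children
            for v, i in tree[u]:
                if not seen[v]:
                    seen[v], parent[v] = True, (u, i)
                    stack.append(v)
    H = [i for i in active if i not in forest]
    count = [0] * n                          # non-forest edge endpoints per node
    for i in H:
        count[edges[i][0]] += 1
        count[edges[i][1]] += 1
    for x in reversed(order):                # children before parents
        if parent[x] is not None:
            p, i = parent[x]
            if count[x] % 2:
                H.append(i)
            count[p] += count[x]
    return H

def split_into_cycles(edges, H):
    """Decompose the even-degree edge set H into edge-disjoint simple cycles."""
    adj = {}
    for i in H:
        u, v = edges[i]
        adj.setdefault(u, []).append((v, i))
        adj.setdefault(v, []).append((u, i))
    used, cycles = set(), []
    for start in adj:
        nodes, path, pos = [start], [], {start: 0}
        while True:
            u = nodes[-1]
            while adj[u] and adj[u][-1][1] in used:
                adj[u].pop()
            if not adj[u]:
                break
            v, i = adj[u].pop()
            used.add(i)
            path.append(i)
            if v in pos:                     # the walk closed a simple cycle
                k = pos[v]
                cycles.append(path[k:])
                del path[k:]
                for x in nodes[k + 1:]:
                    del pos[x]
                del nodes[k + 1:]
            else:
                pos[v] = len(nodes)
                nodes.append(v)
    return cycles

def maximal_capacitated_cycles(n, edges, m0):
    """Sequential sketch of Figure 2.4; returns (S, residual multiplicities)."""
    m, S = list(m0), []
    delta = 1 << max(m0).bit_length()        # 2^ceil(log(M + 1))
    while delta > 1:
        half = delta // 2
        active = [i for i in range(len(edges)) if m[i] >= half]
        order = sorted(active, key=lambda i: m[i] < delta)   # heavy edges first
        forest = spanning_forest(n, edges, order)
        for cycle in split_into_cycles(edges, even_subgraph(n, edges, active, forest)):
            S.append((cycle, half))
            for i in cycle:
                m[i] -= half
        delta = half
    return S, m

edges, m0 = [(0, 1), (1, 2), (2, 0)], [5, 2, 3]
S, m = maximal_capacitated_cycles(3, edges, m0)
print(S, m)
```

On a triangle with multiplicities 5, 2, and 3, the sketch removes the triangle once with multiplicity 2. The residual multiplicities 3, 0, and 1 leave only two edges with positive multiplicity, which form an acyclic subgraph as condition M.2 requires.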



Chapter 3 

Approximating the Minimum 
Cycle Cover 



In this chapter, we develop algorithms for approximating the minimum cycle cover. 
A cycle cover of a graph is a set of cycles in which each edge of the graph is in 
at least one cycle. The minimum cycle cover is a cover that uses the fewest edges 
possible. We give a series of parallel algorithms, each of which finds a smaller cover 
than the previous one. In Section 3.1, after giving the history of the problem, we 
give a simple algorithm that finds a cover of size $O(m + n^2)$. In Section 3.2, we give
a more involved algorithm that finds a cover of size $O(m \log n)$, and in Section 3.3,
we improve the size of the cover to be $O(m + n \log n)$. We conclude by observing
that these techniques also yield an efficient sequential algorithm. 

3.1 Preliminaries, Background, and a First Algorithm 

Let $C = \{C_1, \ldots, C_k\}$ be a set of simple cycles in the undirected graph $G = (V, E)$,
and let $E(C)$ be the set of edges contained in $C$. More generally, we denote the set
of edges in some subgraph $S$ by $E(S)$. We say that $C$ is a cycle cover of the graph
$G = (V, E)$ if $E(C) = E$, i.e. every edge is in at least one cycle in the set $C$. We
define the size of a cycle cover to be the total number of edges in the cycles that
constitute that cover, i.e. $|C| = \sum_{C_i \in C} |E(C_i)|$. The minimum cycle cover is the
cycle cover for which $|C|$ is minimized.
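
These definitions translate directly into a small checker. The sketch below is only an illustration; it assumes cycles are given as lists of edge indices into an edge list, and the helper names are hypothetical.

```python
from collections import Counter


def is_simple_cycle(edges, cycle):
    """True if the distinct edge indices in `cycle` form one simple cycle."""
    if not cycle or len(set(cycle)) != len(cycle):
        return False
    deg, nbrs = Counter(), {}
    for i in cycle:
        u, v = edges[i]
        deg[u] += 1
        deg[v] += 1
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    if any(d != 2 for d in deg.values()):
        return False
    # All degrees are 2, so the edges form disjoint cycles; a search from
    # any node reaches every node only if there is exactly one cycle.
    start = next(iter(nbrs))
    seen, stack = {start}, [start]
    while stack:
        for v in nbrs[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nbrs)


def is_cycle_cover(edges, cycles):
    """True if every cycle is simple and every edge lies on at least one cycle."""
    covered = {i for c in cycles for i in c}
    return (all(is_simple_cycle(edges, c) for c in cycles)
            and covered == set(range(len(edges))))


def cover_size(cycles):
    """Total number of edges over all cycles, counting repeats."""
    return sum(len(c) for c in cycles)


# The complete graph on four nodes, covered by three triangles through node 0.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
cover = [[0, 3, 1], [0, 4, 2], [1, 5, 2]]
print(is_cycle_cover(edges, cover), cover_size(cover))
```

The cover in the example has size 9 although the graph has only 6 edges, because the three edges at node 0 are each counted in two cycles.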

Sequential algorithms for finding a cycle cover have been developed with two 
goals in mind. The first goal is to find a cover of small size, and the second is to
get an algorithm that runs quickly. The first algorithm for this problem, by Itai
and Rodeh [26], finds a cover of size $O(m + n \log n)$ in $O(n^3)$ time. Subsequently,
Itai, Lipton, Papadimitriou and Rodeh [25] showed that every graph has a cover
of size $\min\{3m - 6, m + 6n - 7\}$ and that this cover can be found in $O(n^2)$ time.
This result relies on a result of Jaeger [27] that shows that every biconnected graph
has a nowhere zero flow modulo 8, and results of Tarjan [43] and Shiloach [42] that
find edge-disjoint branchings. They also conjecture that finding the minimum cycle
cover is NP-complete. Alon and Tarsi [6] have developed an algorithm that finds
a smaller cover, one of size at most $\min\{\frac{5}{3}m, m + \frac{7}{3}n - \frac{7}{3}\}$, and runs in $O(m + n^2)$
time. This result relies on a proof by Seymour [41] that every biconnected graph
has a nowhere zero flow modulo 6. Alon and Tarsi also note that a certain graph
called the Petersen graph [26, 25] has 15 edges and no cycle cover of size less than
21. This graph can be generalized to show that there exists an infinite family of
graphs of $m$ edges that have a minimum cycle cover of size at least $\frac{7}{5}m$.

In this section, we consider the problem of finding a cycle cover in parallel. We 
first note that all the sequential algorithms mentioned above rely on edge-disjoint 
branchings and nowhere zero flows. All algorithms known to us for these problems 
require the computation of $\Omega(n)$ maximum flows on graphs with polynomial bounded
capacities. Even if this sequence of computations could be efficiently parallelized, 
the best known NC algorithm for computing one maximum flow in a graph with 
polynomial bounded capacities uses many processors and randomness [30]. 

Thus, we focus on a different strategy that is based on using the algorithm 
Maximal Cycles as a subroutine. First observe that the output of this algorithm is 
a set of cycles $C$ such that $m - n + 1 \leq |E(C)| \leq m$. Thus, we already know how
to find a set of cycles that cover all but $n - 1$ or fewer of the edges. We could then
cover each of the remaining edges with a cycle using $(m + n)/\log n$ processors per
edge in $O(\log n)$ time, yielding a cycle cover of size $m + n(n - 1)$ using $O(\log n)$
time on n(m + n) processors. This gives the fastest parallel algorithm to find any 
type of non-trivial cycle cover. However, the size of the cover and the number of 
processors used are too large to be of practical interest; we will focus on finding a 
cover of smaller size. 
Input: Biconnected graph $G = (V, E)$.
Output: $C$, a cycle cover of size $O(m \log n)$.

1 Initialize $C$ to empty.

2 While $E(C) \neq E$

3     Let $G_C = (V, E(C))$.

4     Find $F$, a spanning forest of $G_C$.

5     Find $T$, a spanning tree of $G$ containing all edges in $F$.

6     Find $\Delta C = \{C_1, \ldots, C_k\}$, a maximal set of edge-disjoint cycles in $G$ covering
      all edges not in $T$.

7     Add the cycles in $\Delta C$ to $C$.

Figure 3.1: Algorithm Cycle Cover
3.2 Finding a Cover of Size $O(m \log n)$

In this section we will use the algorithm Maximal Cycles to develop an algorithm 
that finds a cycle cover of size 0(m log n). As in algorithm Maximal Capacitated 
Cycles, we will utilize our freedom to choose which edges to put in the spanning tree. 
Our algorithm will proceed in iterations; in each iteration we will use the algorithm 
Maximal Cycles to choose a set of cycles to add to our collection C. In general, 
during iteration i we will prefer to put in the spanning tree edges that were put into 
C in some iteration previous to i. Thus, we will force as many uncovered edges as 
possible to be non-tree edges. This means that they will be included in some cycle 
during iteration i and hence added to C. 

To achieve this, we will first find a spanning forest of the graph induced by 
E(C), the edges already covered. Next, we extend this to a spanning tree of G. 
Given this tree, we use the algorithm Maximal Cycles to find a maximal set of 
edge-disjoint cycles in G, which we then add to C. The algorithm appears in Figure 
3.1. Informally, this strategy achieves our goal of putting as few uncovered edges as 
possible in the spanning tree; we will now proceed to show this more rigorously. 

The key to the analysis of the algorithm is to show that the number of iterations 
is O(log n). At the beginning of each iteration, we have a graph G in which a set 
E(C) of edges are covered. We would like to be able to show that in each iteration, 
a constant fraction of the uncovered edges become covered. As we will show, if all 
nodes have degree three or more, this is true. If a graph has nodes of 
degree two, however, it is not necessarily the case that a constant fraction of the 
uncovered edges become covered. Nevertheless, by defining progress in terms of an 
auxiliary graph in which every node has degree three or more, we obtain the desired result. 

We derive an auxiliary graph H = H(G, E(C)) from G and the covered edges 
such that each iteration reduces the number of edges of H by a constant fraction. 
To obtain H from G, first contract all the covered edges E(C), then splice out all 
nodes of degree two. (To "splice out" a node of degree two is to contract one of its 
incident edges.) 
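
The construction of H can be sketched sequentially as follows. This is only an illustration under our own assumptions, not the parallel procedure: the function name `auxiliary_graph`, the edge-index representation of the covered set, and the quadratic-time splicing loop are all hypothetical. Self-loops and parallel edges produced by contraction are kept, and a self-loop contributes two to the degree of its node.

```python
from collections import Counter


def auxiliary_graph(n, edges, covered):
    """Return the edge list of H(G, E(C)) as a multigraph.

    n       -- number of nodes of G, labelled 0..n-1
    edges   -- list of (u, v) pairs, the edges of G
    covered -- set of indices into `edges` that belong to E(C)
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Step 1: contract every covered edge.
    for i in covered:
        u, v = edges[i]
        parent[find(u)] = find(v)
    h = [(find(u), find(v)) for i, (u, v) in enumerate(edges) if i not in covered]

    # Step 2: splice out nodes of degree two until none remain.
    while True:
        deg = Counter()
        for u, v in h:
            deg[u] += 1
            deg[v] += 1  # a self-loop (u == v) contributes 2
        node = next((x for x in deg if deg[x] == 2), None)
        if node is None:
            return h
        incident = [i for i, (u, v) in enumerate(h) if node in (u, v)]
        if len(incident) == 1:
            # The node carries a single self-loop; contracting it removes the edge.
            h.pop(incident[0])
        else:
            i, j = incident  # i < j, and neither edge is a self-loop
            a = h[i][0] if h[i][1] == node else h[i][1]
            b = h[j][0] if h[j][1] == node else h[j][1]
            h.pop(j)
            h.pop(i)
            h.append((a, b))  # the two edges at `node` merge into one
```

On a simple cycle with no covered edges the sketch returns an empty edge list, which corresponds to Case 2 of the lemma below.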

Lemma 3.2.1 Let C be the cycle cover at the beginning of some iteration of 
algorithm Cycle Cover, and let $\Delta C$ be the collection of edge-disjoint cycles found 
in step 6. Then either $H(G, E(C \cup \Delta C))$ has at most two-thirds the edges of 
H(G, E(C)), or the algorithm terminates immediately. 

Proof: Let $T_H$ be the spanning tree of H = H(G, E(C)) obtained from the spanning 
tree T of G by contraction: contract each edge of T that was contracted in obtaining 
H from G. Every non-tree edge in H with respect to $T_H$ is a non-tree edge in G 
with respect to T, so the edges covered by the cycle collection $\Delta C$ found in step 6 
include all non-tree edges of H. There are two cases to consider: 

Case 1: (H has at least one edge.) Let $n_H$ be the number of nodes in H. Since 
every node of H has degree at least three, H has at least $\tfrac{3}{2} n_H$ edges. The number 
of tree edges in H is $n_H - 1$, hence the number of non-tree edges of H is more 
than one-third the number of edges of H. Thus the graph H' obtained from H by 
contracting non-tree edges has fewer than two-thirds the edges of H. But the graph 
$H(G, E(C \cup \Delta C))$ is obtainable from H' by contractions, and so has no more edges 
than H'. 

Case 2: (H has no edges.) Since contraction preserves connectivity, H must consist 
of a single isolated node. Then the graph G', obtained from G by contracting edges 
covered by C, is biconnected with degree at most two, and hence is a simple cycle 
(or a single isolated node). In this case, we claim that the collection $\Delta C$ of step 
6 covers all as-yet-uncovered edges of G, and hence that the algorithm terminates. 
To prove the claim, simply contract the edges of $\Delta C$ that are already in C. The 
resulting cycle collection $\Delta C'$ is a subgraph of G'; since G' consists of a simple cycle, 
$\Delta C'$ must include every edge of G'. ■ 
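
The counting step in Case 1 can be written out line by line. The display below is only a restatement of that argument; $|T_H|$ denotes the number of edges of the contracted spanning tree, a piece of notation introduced here for convenience.

```latex
\begin{align*}
|E(H)| &= \tfrac{1}{2}\sum_{v \in V(H)} \deg(v) \;\ge\; \tfrac{3}{2}\, n_H
  && \text{(every node has degree at least three)} \\
|T_H| &= n_H - 1 \;<\; n_H \;\le\; \tfrac{2}{3}\,|E(H)|
  && \text{(tree edges of } H\text{)} \\
|E(H)| - |T_H| &> \tfrac{1}{3}\,|E(H)|
  && \text{(non-tree edges of } H\text{)} \\
|E(H')| &= |T_H| \;<\; \tfrac{2}{3}\,|E(H)|
  && \text{(after contracting all non-tree edges)}
\end{align*}
```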

Given this lemma, we can prove the following theorem: 

Theorem 3.2.2 Algorithm Cycle Cover finds a cycle cover of size O(m log n) in 
O(log^2 n) time using (m + n)/log n processors. 

Proof: Each iteration of the while loop adds a set of edge-disjoint cycles of total 
size at most m. By Lemma 3.2.1, there are O(log n) iterations, and each iteration is 
the algorithm Maximal Cycles with some restrictions placed on the choice of the 
spanning tree. ■ 

We note that in practice, we would change Step 7 to include a cycle of $\Delta C$ only if 
it contained some edge that was not already in C. This could be checked in O(log n) 
time on (m + n)/log n processors using pointer jumping [10]. Also, we could replace 
Steps 3, 4, and 5 with the computation of a minimum spanning tree of G with the 
edges in E(C) weighted with 0 and the rest of the edges weighted with 1. 
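
The minimum spanning tree variant of Steps 3, 4, and 5 can be illustrated with a sequential sketch. It assumes covered edges receive weight 0 and all other edges weight 1, and it uses Kruskal's algorithm with a union-find structure rather than any parallel method. The name `tree_preferring_covered` and the edge-index representation are hypothetical.

```python
def tree_preferring_covered(n, edges, covered):
    """Return indices of a spanning tree of a connected graph that uses as
    many covered edges as possible (Kruskal with weights 0 and 1).

    n       -- number of nodes, labelled 0..n-1
    edges   -- list of (u, v) pairs
    covered -- set of indices into `edges` that are already in E(C)
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Weight 0 (covered) sorts before weight 1 (uncovered); the sort is stable.
    order = sorted(range(len(edges)), key=lambda i: 0 if i in covered else 1)
    tree = []
    for i in order:
        ru, rv = find(edges[i][0]), find(edges[i][1])
        if ru != rv:
            parent[ru] = rv
            tree.append(i)
    return tree
```

Every uncovered edge left out of the returned tree is a non-tree edge, so it lies on some cycle found in Step 6.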

3.3 Finding a Cover of Size O(m + n log n) 

In this section, we show how to decrease the size of the cycle cover from O(m log n) 
to O(m + n log n) using no additional resources. Observe that the first iteration of 
the algorithm finds a set of cycles covering at least m - n + 1 edges. Thus the number 
of edges not in C is at most n - 1. Assume that we can form a biconnected graph B 
that contains all the uncovered edges, and has at most 2n edges. We can run the 
algorithm Cycle Cover on B and obtain a cover of size O(n log n). Then we can 
combine this cover with the set of cycles obtained from the first iteration of the 
algorithm on G, and obtain a cycle cover of size O(m + n log n), using no additional 
resources. It is known how to find such a graph B in linear time sequentially [25], 
but this requires using depth-first search. We present a parallel algorithm that does 
not use depth-first search and finds a graph B in O(log n) time using (m + n)/log n 
processors. 

First, for each node v, we compute level(v), its distance from the root in some 
rooted spanning tree T. Then, for each non-tree edge (v,w), we compute lca(v,w), 
the lowest common ancestor of v and w in T. Finally, we choose the edges of B by 
including the spanning tree T, and, for each node v, the edge (v,w), where w is the 
non-tree neighbor of v that minimizes level(lca(v,w)). 
Input: A biconnected graph G and a rooted spanning tree T. 
Output: A biconnected graph $B = (V, E_B)$ s.t. $T \subseteq B$ and $|E_B| < 2n$. 

1 $\forall v \in V$, compute level(v), the distance from v to the root of T. 

2 $\forall (v,w) \notin T$, compute lca(v,w). 

3 $\forall v \in V$, let N(v) be the non-tree neighbor w minimizing level(lca(v,w)). 

4 Let $E_B = T \cup \{(v, N(v)) : v \in V\}$. 

Figure 3.2: Algorithm Sparse Biconnected Subgraph 
This algorithm, Sparse Biconnected Subgraph, appears in Figure 3.2. 
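
For illustration, the four steps of Figure 3.2 can be written as a short sequential program. This is a sketch under our own assumptions, not the parallel implementation: levels come from a breadth-first traversal of T, lca is computed by naively walking up parent pointers, ties in step 3 are broken by input order, and the name `sparse_biconnected_subgraph` is hypothetical.

```python
from collections import deque


def sparse_biconnected_subgraph(n, edges, tree_edges, root=0):
    """Return E_B as a set of frozenset({u, v}) edges.

    n          -- number of nodes, labelled 0..n-1
    edges      -- list of (u, v) pairs, the edges of G
    tree_edges -- list of (u, v) pairs forming a spanning tree T of G
    """
    tree_set = {frozenset(e) for e in tree_edges}
    adj = [[] for _ in range(n)]
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)

    # Step 1: level(v) = distance from v to the root of T.
    parent = [None] * n
    level = [0] * n
    parent[root] = root
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if parent[v] is None:
                parent[v] = u
                level[v] = level[u] + 1
                queue.append(v)

    def lca(u, v):
        # Step 2 (naive): repeatedly move the deeper endpoint up one level.
        while u != v:
            if level[u] < level[v]:
                u, v = v, u
            u = parent[u]
        return u

    # Step 3: for each node keep the non-tree neighbor whose lca is highest.
    best = {}
    for u, v in edges:
        if frozenset((u, v)) in tree_set:
            continue
        key = level[lca(u, v)]
        for a, b in ((u, v), (v, u)):
            if a not in best or key < best[a][0]:
                best[a] = (key, b)

    # Step 4: E_B = T plus the chosen edge (v, N(v)) of every node.
    return tree_set | {frozenset((a, b)) for a, (_, b) in best.items()}
```

On the complete graph with five nodes and a path as the spanning tree, the sketch keeps 8 of the 10 edges.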

Lemma 3.3.1 Given a biconnected graph G and a spanning tree T, algorithm 
Sparse Biconnected Subgraph computes a biconnected graph $B = (V, E_B)$ s.t. $T \subseteq B$ 
and $|E_B| \le 2n - 1$. 

Proof: Since $E_B$ contains the n - 1 tree edges and at most one additional edge per 
node, $|E_B| \le 2n - 1$. Further, since B contains a spanning tree, B is connected. 
Now we will argue that B is biconnected. Assume that B is not biconnected. This 
implies that there exists a node v' that is an articulation point of B, i.e. there 
is no edge from a descendant of v' to an ancestor of v' (descendant and ancestor 
are defined with respect to T). This implies that for each descendant x of v' in 
G, the y that minimizes level(lca(x,y)) is also a descendant of v'. But if the y 
that minimizes level(lca(x,y)) is a descendant of v', then all y such that (x,y) is an 
edge must also be descendants of v'. This implies that v' is an articulation point 
of G, which contradicts the assumption that G is biconnected. Thus, B must be 
biconnected. ■ 

Combining Lemma 3.3.1 with Theorem 3.2.2 we get the main result of this 
section. 

Theorem 3.3.2 Combining algorithms Cycle Cover and Sparse Biconnected 
Subgraph yields an algorithm that finds a cycle cover of size O(m + n log n) using 
O(log^2 n) time on (m + n)/log n processors. 

Proof: The bound follows from the previous results and the results of [40] that 
show how to compute lca and level in the stated time bounds. ■ 

Our parallel algorithm translates into an efficient sequential algorithm. 

Theorem 3.3.3 A sequential implementation of algorithms Cycle Cover and Sparse 
Biconnected Subgraph finds a cycle cover of size O(m + n log n) in sequential time 
O(m + n log n). 

Proof: The first maximal set of edge-disjoint cycles can be computed in O(m + n) 
time by algorithm Maximal Cycles. We then find a sparse biconnected subgraph 
in O(n + m) time. Finally we run algorithm Cycle Cover on the sparse graph in 
O(n log n) time. ■ 

The cover we find is within a factor of $O(1 + \frac{n \log n}{m + n})$ of optimal, since any cycle 
cover has size at least m. Thus for graphs with m > n log n, the size of the cover is 
within a constant factor of optimal, and the algorithm is faster than any of the 
previous algorithms. 

We conclude by observing the algorithm generalizes to multigraphs in the natural 
way, by replacing the algorithm of Section 2.2 with the algorithm of Section 2.3, as 
is summarized in the following theorem: 

Theorem 3.3.4 Let G = (V,E,c) be a multigraph with maximum edge multiplicity 
C. Then, algorithm Cycle Cover with algorithm Maximal Cycles replaced by 
Maximal Capacitated Cycles finds a cycle cover of size O(mC log n) and runs in 
O(log^2 n log C) time on (m + n)/log n processors. 

Proof: The running time follows from Theorems 3.2.2 and 2.3.2. The size of the 
cycle cover is O(mC log n) because there are at most mC edges in the graph and 
O(log n) iterations of Maximal Capacitated Cycles. ■ 

Note that we did not use a reduction to a sparse biconnected graph, as it is not 
clear, in this case, what exactly a sparse biconnected multigraph is. 



Chapter 4 

Finding an Assignment while 
Doing Less Work 



In this chapter, we show how to use scaling to reduce the total amount of work 
needed to find a minimum perfect matching in a bipartite graph. Our techniques 
convert algorithms that use a number of processors dependent on the magnitude of 
the largest cost in the graph into algorithms that use a number of processors that 
is independent of the edge costs. 

4.1 Preliminaries 

Let G = (V, E, c) be a graph with node set V, edge set E, and an integral cost c(v,w) 
associated with each edge (v,w). The edges of a graph may be either undirected or 
directed. In the former case, we will denote an edge between node v and node w 
by (v,w), while in the latter case, we will denote an edge from node v to node w 
as [v,w]. A graph is bipartite if the nodes can be divided into two sets $V_1$ and $V_2$ 
such that $V = V_1 \cup V_2$, $V_1 \cap V_2 = \emptyset$, and all edges have one endpoint in $V_1$ and one 
endpoint in $V_2$. 

A matching on a graph is a set M of edges, such that each node is incident to 
no more than one edge from M. A perfect matching is a matching in which every 
node is incident to exactly one matched edge. If the edges have costs, the cost 
of a matching is the sum of the costs of the edges in the matching. A minimum 
perfect matching (MPM) is the perfect matching with the smallest possible cost. In 
a bipartite graph, an MPM is also called an assignment. It will be convenient to 
associate an integer-valued dual variable d(v) with each node v. This allows us to 
define $\bar{c}(v,w)$, the reduced cost of edge (v,w), with respect to dual variables d by 
$\bar{c}(v,w) = c(v,w) - d(v) - d(w)$. Let M be a matching. We say that a set of dual 
variables is tight if 

\[ (v,w) \in M \implies \bar{c}(v,w) \le 0 \tag{4.1} \]

\[ (v,w) \notin M \implies \bar{c}(v,w) \ge 0. \tag{4.2} \]

We say that a set of dual variables is zero-tight if 

\[ (v,w) \in M \implies \bar{c}(v,w) = 0 \tag{4.3} \]

\[ (v,w) \notin M \implies \bar{c}(v,w) \ge 0. \tag{4.4} \]

We define the work done by an algorithm as the product of the number of 
processors used and the time spent. 

The first algorithm for the assignment problem was Kuhn's Hungarian algorithm 
[33]. Implemented with Fibonacci heaps [16], this algorithm runs in O(nm + n^2 log n) 
time, which remains the best known strongly polynomial time bound for the 
assignment problem. Using ideas from this algorithm, the cardinality matching algorithm 
of Hopcroft and Karp [24], and scaling, Gabow and Tarjan [19] have developed an 
algorithm that runs in $O(\sqrt{n}\, m \log(nC))$ time. There are no known NC algorithms for 
the assignment problem; however, there are RNC algorithms under the assumption 
that the input is given in unary. The first RNC algorithm under this assumption 
was given by Karp, Upfal, and Wigderson [30]. An implementation of this 
algorithm by Galil and Pan [20] uses (n + C')M(n) processors and O(log n log^2(nC')) 
time, where C' is an upper bound on the maximum cost of any matching, and M(n) 
is the minimum number of processors needed to multiply two n x n matrices. 
Currently $M(n) = O(n^{2.376})$ [12], and trivially, $M(n) = \Omega(n^2)$. Subsequently, a faster 
algorithm was discovered by Mulmuley, Vazirani, and Vazirani [38] that finds an 
assignment in O(log^2 n) time using nmCM(n) processors, where C is the largest 
edge cost in the input graph. As neither one of these algorithms does less work 
than the other on all graphs, we will give our improvements relative to both of these 
algorithms. 

Input: G = (V, E, c) and a perfect matching $M \subseteq E$. 
Output: Zero-tight dual variables. 

1 Let G' = (V', E', c') be the directed graph with 

    • $V' = V \cup \{s\}$ 

    • $E' = \{[s,v] \mid v \in V_1\} \cup \{[v,w] \mid (v,w) \notin M\} \cup \{[w,v] \mid (v,w) \in M\}$ 

    • $c'[v,w] = \begin{cases} 0 & v = s \\ c(v,w) & v \ne s \text{ and } (v,w) \notin M \\ -c(v,w) & v \ne s \text{ and } (v,w) \in M. \end{cases}$ 

2 $\delta(v) \leftarrow$ shortest path distance from s to v in G'. 

3 $\forall v \in V_1$, $\delta(v) \leftarrow -\delta(v)$. 

4 $\epsilon(w) = \begin{cases} 0 & w \in V_1 \\ c(v',w) - \delta(v') - \delta(w) & w \in V_2 \text{ and } (v',w) \in M. \end{cases}$ 

5 Return $\delta + \epsilon$. 

Figure 4.1: Procedure Compute Zero-Tight Duals (V, E, c, M) 

4.2 Computing Zero-Tight Dual Variables 

In this section, we address the problem of finding zero-tight duals given a matching. 
This can be done by the following procedure. First we create a directed graph by 
directing all the unmatched edges from $V_1$ to $V_2$ and all the matched edges from $V_2$ 
to $V_1$. We give the matched edges negative costs and the unmatched edges positive 
costs. Further, we add an extra node s, with edges of cost 0 from s to every node 
in $V_1$. Now, we compute the shortest path distance from s to every node v, and let 
these distances be the dual variables. To convert these distances to zero-tight duals, 
we add, to each dual variable in $V_2$, the reduced cost of its associated matched edge. 
The details appear in Figure 4.1. 
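
The procedure can be sketched sequentially as follows. We stress that this is an illustration under our own assumptions: the shortest paths are computed with Bellman-Ford rather than with a parallel matrix-based method, the name `compute_zero_tight_duals` and the dictionary representation of costs are hypothetical, and the source node is given the label "s", which is assumed not to clash with any node label.

```python
def compute_zero_tight_duals(V1, V2, cost, matching):
    """Return zero-tight duals for a minimum perfect matching.

    V1, V2   -- the two sides of the bipartition
    cost     -- dict mapping (v, w), with v in V1 and w in V2, to c(v, w)
    matching -- set of (v, w) pairs forming a minimum perfect matching
    """
    s = "s"  # extra source node (assumed label)

    # Step 1: cost-0 arcs from s into V1, unmatched edges directed V1 -> V2,
    # matched edges directed V2 -> V1 with negated cost.
    arcs = [(s, v, 0) for v in V1]
    for (v, w), c in cost.items():
        if (v, w) in matching:
            arcs.append((w, v, -c))
        else:
            arcs.append((v, w, c))

    # Step 2: Bellman-Ford shortest path distances from s (|V'| - 1 rounds).
    dist = {u: float("inf") for u in list(V1) + list(V2)}
    dist[s] = 0
    for _ in range(len(dist) - 1):
        for a, b, c in arcs:
            if dist[a] + c < dist[b]:
                dist[b] = dist[a] + c

    # Step 3: negate the distances on V1; epsilon is 0 there.
    duals = {v: -dist[v] for v in V1}

    # Steps 4 and 5: for w in V2 matched to v, delta(w) + epsilon(w)
    # simplifies algebraically to c(v, w) - delta(v).
    for v, w in matching:
        duals[w] = cost[(v, w)] - duals[v]
    return duals
```

With costs c(0,2) = 1, c(0,3) = 4, c(1,2) = 2, c(1,3) = 3 and the matching {(0,2), (1,3)}, the sketch gives every matched edge reduced cost 0 and every unmatched edge reduced cost 1.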

Lemma 4.2.1 The dual variables computed by procedure Compute Zero-Tight 
Duals are zero-tight with respect to matching M in graph G'. 

Proof: First note that because M is an MPM, G' has no negative cycles, so the 
shortest path distances are well-defined. Now, we will show that the dual variables 
$\delta$ computed in Steps 1, 2, and 3 are tight with respect to c. By the shortest path 
inequality and step 3, 

\[ (v,w) \notin M \implies \delta(w) \le c(v,w) + (-\delta(v)) \implies c(v,w) - \delta(v) - \delta(w) \ge 0. \tag{4.5} \]

Similarly, 

\[ (v,w) \in M \implies -\delta(v) \le -c(v,w) + \delta(w) \implies c(v,w) - \delta(v) - \delta(w) \le 0, \tag{4.6} \]

so equations 4.1 and 4.2 are satisfied. Now consider the reduced costs of the edges 
with respect to dual variables $\delta + \epsilon$. Because the $\epsilon(w)$'s are either the reduced costs 
of matched edges or 0, they are always non-positive. Thus, if $(v,w) \notin M$, its new 
reduced cost is 

\begin{align*}
c(v,w) - (\delta(v) + \epsilon(v)) - (\delta(w) + \epsilon(w))
  &= c(v,w) - \delta(v) - \delta(w) - (\epsilon(v) + \epsilon(w)) \\
  &\ge c(v,w) - \delta(v) - \delta(w) && \text{(because } \epsilon(u) \le 0 \ \forall u \in V\text{)} \\
  &\ge 0 && \text{by equation 4.5.}
\end{align*}

If $(v,w) \in M$, then its new reduced cost is 

\[ c(v,w) - \delta(v) - \bigl(\delta(w) + (c(v,w) - \delta(v) - \delta(w))\bigr) = 0. \]

Thus, the dual variables satisfy conditions 4.3 and 4.4, and are indeed zero-tight. ■ 

4.3 A Scaling Algorithm 

Now we give the complete algorithm that combines scaling, the procedure Compute 
Zero-Tight Duals, and a subroutine for computing an MPM. The algorithm proceeds 
in O(log C) iterations. At the beginning of each iteration, one bit is added to the 
costs and dual variables. Then a perfect matching and zero-tight dual variables are 
found on the graph with edges of reduced cost no greater than 2n. The new dual 
variables are added to the old ones and the iteration terminates. The key to the 
efficiency of this algorithm is step 4, where we ignore edges with reduced cost greater 
than 2n. It remains to be shown that this does not change the value of the MPM. 
The details of the algorithm appear in Figure 4.2. 

Input: $G = (V, E, c_0)$, an undirected bipartite graph with bipartition $V_1$ and $V_2$ and 
cost $c_0(v,w)$ on edge (v,w). Assume that G contains a perfect matching. 
Output: A minimum perfect matching M. 

1 $d(v) \leftarrow \begin{cases} 1/2 & v \in V_1 \\ 0 & v \in V_2. \end{cases}$ 
  $c(v,w) \leftarrow 0 \quad \forall (v,w) \in E$. 
  Let $C = \max_{(v,w) \in E} \{|c_0(v,w)|\}$. 

2 For $l = 1$ to $\lceil \log_2 C \rceil$ 

3     $d(v) \leftarrow \begin{cases} 2d(v) - 1 & v \in V_1 \\ 2d(v) & v \in V_2. \end{cases}$ 
      $c(v,w) \leftarrow 2c(v,w) + (\text{the } l\text{th signed bit of } c_0(v,w)) \quad \forall (v,w) \in E$. 

4     Let $E' = \{(v,w) \mid (v,w) \in E \text{ and } \bar{c}(v,w) \le 2n\}$. 

5     Compute M, an MPM in $G' = (V, E', \bar{c})$. 

6     $\Delta \leftarrow$ Compute Zero-Tight Duals $(V, E', \bar{c}, M)$. 

7     $d(v) \leftarrow d(v) + \Delta(v) \quad \forall v \in V$. 

8 Output M, a minimum perfect matching. 

Figure 4.2: Algorithm Assignment $(V, E, c_0)$. Letting d be non-integral is simply 
for ease of presentation. Note that d immediately becomes integral in step 3 and 
remains integral throughout the remainder of the algorithm. 
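
To make the control flow of algorithm Assignment concrete, we give a small sequential sketch. It rests on several assumptions of ours: costs are non-negative integers (so the signed bit is simply a binary digit, taken most significant first), the number of rounds is the bit length of C, the matching subroutine is a brute-force search over permutations standing in for any MPM algorithm, both sides of the bipartition have equal size, and the initial dual value 1/2 on $V_1$ is chosen so that d becomes integral after the first doubling. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def zero_tight_duals(V1, V2, cost, M):
    """Bellman-Ford stand-in for procedure Compute Zero-Tight Duals."""
    s = "s"  # extra source node; assumed not to clash with a node label
    arcs = [(s, v, 0) for v in V1]
    for (v, w), c in cost.items():
        arcs.append((w, v, -c) if (v, w) in M else (v, w, c))
    dist = {u: float("inf") for u in list(V1) + list(V2)}
    dist[s] = 0
    for _ in range(len(dist) - 1):
        for a, b, c in arcs:
            if dist[a] + c < dist[b]:
                dist[b] = dist[a] + c
    duals = {v: -dist[v] for v in V1}
    for v, w in M:
        duals[w] = cost[(v, w)] - duals[v]  # equals delta(w) + epsilon(w)
    return duals


def brute_force_mpm(V1, V2, cost):
    """Exhaustive minimum perfect matching; only for tiny examples."""
    best, best_cost = None, None
    for perm in permutations(V2):
        pairs = list(zip(V1, perm))
        if all(p in cost for p in pairs):
            total = sum(cost[p] for p in pairs)
            if best is None or total < best_cost:
                best, best_cost = set(pairs), total
    return best


def assignment(V1, V2, c0):
    """Scaling loop of Figure 4.2 for non-negative integer costs c0."""
    n = len(V1) + len(V2)
    bits = max(1, max(c0.values()).bit_length())
    d = {v: Fraction(1, 2) for v in V1}           # step 1
    d.update({w: Fraction(0) for w in V2})
    c = {e: 0 for e in c0}
    M = None
    for l in range(1, bits + 1):                   # step 2
        for v in V1:                               # step 3: double the duals
            d[v] = 2 * d[v] - 1
        for w in V2:
            d[w] = 2 * d[w]
        for e in c:                                # step 3: append one cost bit
            c[e] = 2 * c[e] + ((c0[e] >> (bits - l)) & 1)
        reduced = {(v, w): c[(v, w)] - d[v] - d[w] for (v, w) in c}
        small = {e: r for e, r in reduced.items() if r <= 2 * n}  # step 4
        M = brute_force_mpm(V1, V2, small)         # step 5
        delta = zero_tight_duals(V1, V2, small, M)  # step 6
        for u in d:                                # step 7
            d[u] += delta[u]
    return M                                       # step 8
```

The point of the sketch is that the matching subroutine only ever sees reduced costs of at most 2n, whatever the magnitude of the input costs.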

Lemma 4.3.1 The graph $G' = (V, E', \bar{c})$ formed in step 4 of algorithm 
Assignment always contains an MPM of total cost no more than 2n. Further, 
$\forall (v,w) \in E$, $\bar{c}(v,w) \ge 0$. 

Proof: We will prove this by induction on the number of iterations of the loop in 
step 2. During the first iteration, all costs are either 0 or 1, so the lemma is true. 
Assume that it is true after step 4 on iteration l - 1. Then, by Lemma 4.2.1, the 
costs $\bar{c}$ are zero-tight with respect to $\Delta$. Because 

\begin{align*}
\bar{c}(v,w) - \Delta(v) - \Delta(w)
  &= (c(v,w) - d(v) - d(w)) - \Delta(v) - \Delta(w) \\
  &= c(v,w) - (d(v) + \Delta(v)) - (d(w) + \Delta(w)),
\end{align*}

it is also true that c is zero-tight with respect to $d + \Delta$. 

Thus, in the graph G = (V, E, c), the reduced cost of M with respect to $d + \Delta$ 
is 0 and the reduced cost of every edge is non-negative. So at the start of iteration 
l, all the edges have non-negative reduced cost and the MPM has reduced cost 0. 
Now consider the effect of adding a new bit of cost and updating the dual variables 
in Step 3. Let the subscripts old and new refer to the old and new values of the 
variables. 

\begin{align*}
c_{new}(v,w) - d_{new}(v) - d_{new}(w)
  &= 2c_{old}(v,w) + (-1 \text{ or } 0 \text{ or } 1) - (2d_{old}(v) - 1) - 2d_{old}(w) \\
  &= 2(c_{old}(v,w) - d_{old}(v) - d_{old}(w)) + (0 \text{ or } 1 \text{ or } 2)
\end{align*}

Letting $\bar{c}(v,w)$ be the reduced cost of edge (v,w) we conclude that 

\[ 2\bar{c}_{old}(v,w) \le \bar{c}_{new}(v,w) \le 2\bar{c}_{old}(v,w) + 2 \quad \forall (v,w) \in E. \tag{4.7} \]

Since we have just shown that all the old reduced costs are non-negative, it is clear 
that all the new reduced costs are also non-negative. Now consider the new cost of the 
matching from the previous iteration. Using the second inequality in equation 4.7, 
and the fact that the old reduced cost of every matched edge is 0, we see that 

\[ \sum_{(v,w) \in M} \bar{c}_{new}(v,w) \le \sum_{(v,w) \in M} \bigl(2\bar{c}_{old}(v,w) + 2\bigr) \le \sum_{(v,w) \in M} 2 \le 2n. \]




Thus the condition is satisfied after Step 4 in iteration /, and the induction holds. 
■ 

From this lemma we conclude that no edge of reduced cost greater than 2n can 
be in the MPM, thus justifying their exclusion from the matching subroutine. This 
leads to our main result. 

Theorem 4.3.2 Let algorithm A be a randomized parallel algorithm for MPM that 
uses Cf(n,m) processors and O(log^k n) time, where f(n,m) is a polynomial in n 
and m and k is a non-negative integer. Using algorithm Assignment we can convert 
algorithm A into an algorithm for MPM that uses nf(n,m) + M(n) processors and 
O((log^k n + log^2 n) log C) time. 

Proof: First we must verify that our algorithm actually finds an MPM. From Lemma 
4.3.1, we see that ignoring edges of reduced cost greater than 2n does not change 
the value of the MPM. Therefore, at each step we find a valid MPM with respect to 
the reduced costs. Because an MPM with respect to the reduced costs has the same 
value as an MPM with respect to the actual costs, in the last iteration we really are 
finding an MPM in the graph where the current edge costs are the same as the edge 
costs of the input graph, thus proving correctness. To derive the resource bounds, 
observe that whenever we find an MPM in step 5, $C \le 2n$. Procedure Compute 
Zero-Tight Duals is dominated by the shortest path computation, which takes O(log^2 n) 
time on M(n) processors. All other steps in the algorithm can be implemented in 
constant time on O(m + n) processors. Combining these observations with the fact 
that there are only log C iterations of the main loop, the theorem follows. ■ 

Corollary 4.3.3 

• Algorithm Assignment, combined with the matching algorithm of [30], yields a 
randomized parallel algorithm for computing an MPM using n^2 M(n) processors 
and O(log^3 n log C) time. 

• Algorithm Assignment, combined with the matching algorithm of [38], yields 
a randomized parallel algorithm for computing an MPM using n^2 m M(n) 
processors in O(log^2 n log C) time. 

Proof: Immediate from Theorem 4.3.2 and the algorithms in [30], [20], and [38]. ■ 

Observe that our algorithm performs less work in the case that $C = \Omega(n^{1+\epsilon})$ for 
some $\epsilon > 0$. Further, our algorithm outperforms the old algorithms by a factor of 
$\Omega(\frac{C}{n \log C})$, so as C gets larger, our algorithm becomes even more efficient than the 
previous algorithms. 
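
As a check on the stated improvement factor, the work bounds for the algorithm of [38] can be compared directly. The display below ignores constant factors and uses $W_{old}$ and $W_{new}$ as our own shorthand for the processor-time products before and after applying algorithm Assignment.

```latex
\begin{align*}
W_{old} &= nmC\,M(n) \cdot \log^2 n, \\
W_{new} &= n^2 m\,M(n) \cdot \log^2 n \,\log C, \\
\frac{W_{old}}{W_{new}} &= \frac{C}{n \log C}.
\end{align*}
```

The ratio exceeds one for large n as soon as $C = \Omega(n^{1+\epsilon})$, in line with the observation above.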

We can extend this algorithm for a minimum perfect matching to one that finds 
a minimum-cost (not necessarily perfect) matching. Let G = (V, E, c) be a graph 
in which we would like to find a minimum-cost matching. We employ the standard 
trick of using an augmented graph $G' = (V, V \times V, c')$ where c'(v,w) = c(v,w) if 
$(v,w) \in E$ and c'(v,w) = nC otherwise. It is easy to see that a minimum perfect 
matching in G' corresponds to a minimum-cost matching in G. 
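
The augmentation amounts to filling in the missing entries of the cost function. A minimal sketch, assuming the bipartite setting of this chapter (so that only pairs in $V_1 \times V_2$ are added, a restriction we impose here) and a dictionary of costs; the name `augment_costs` is hypothetical.

```python
def augment_costs(V1, V2, cost):
    """Return c' on V1 x V2: the original cost on edges of G, n*C elsewhere.

    V1, V2 -- the two sides of the bipartition
    cost   -- dict mapping (v, w), with v in V1 and w in V2, to c(v, w)
    """
    n = len(V1) + len(V2)        # number of nodes
    C = max(cost.values())       # maximum edge cost in G
    return {(v, w): cost.get((v, w), n * C) for v in V1 for w in V2}
```

Since the padded costs are at most nC, the largest cost seen by algorithm Assignment grows from C to nC, which is where the log(nC) factor in the next corollary comes from.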

Corollary 4.3.4 Given a graph G with maximum edge cost C, running algorithm 
Assignment on G' yields an algorithm that finds a minimum-cost matching using 
n^2 M(n) processors and O(log^3 n log(nC)) time or n^2 m M(n) processors and 
O(log^2 n log(nC)) time. 



Chapter 5 

Conclusions and Open 
Problems 



We have presented a new technique for decomposing undirected graphs and have 
given one application: finding an approximation to the minimum cycle cover. We 
suspect that this technique will be useful for solving other problems as well. For 
example, observe that the maximal edge-disjoint cycles problem is closely related to 
a problem that arises in finding a minimum-cost circulation in a network, namely, 
finding a maximal set of weighted cycles in a positively-weighted directed graph. 
In this case, the weight of an edge represents the capacity of that edge, and the 
weight of a cycle represents the flow on that cycle. A maximal set of weighted 
cycles corresponds directly to a set of capacitated cycles such that, after flow is 
pushed around these cycles, the graph of edges that still have positive capacity is 
acyclic. Goldberg and Tarjan [22] solve the minimum-cost circulation problem by 
repeatedly finding a maximal set of weighted cycles; they show how to solve the 
latter problem sequentially in O(m log n) time. 

In view of the application to minimum-cost circulation, it is an important open 
problem to determine whether there is an efficient parallel algorithm for eliminating 
cycles in a weighted directed graph. At present, the most efficient parallel algorithm 
for this problem uses a reduction to weighted non-bipartite matching, which takes 
O(log^2 n) time on nmM(n) processors, and uses randomization [38]. 

We have also given an algorithm for the assignment problem that performs less 
work than the previously known RNC algorithms. It has the appealing feature of 
having the number of processors be independent of the size of the edge costs. Actual 
parallel machines have a fixed number of processors. Therefore, this technique gives 
a way to solve assignment problems with arbitrarily large edge costs without having 
to resort to a machine with more processors. 

In contrast with previous algorithms, this algorithm only works for bipartite 
graphs. This is because the problem of finding tight dual variables in general graphs 
appears to be no easier than actually finding a matching, even sequentially [14]. 
However, finding dual variables is the only part of the algorithm that does not 
generalize to general graphs. 